A genetic programming-based approach to identify potential inhibitors of serine protease of Mycobacterium tuberculosis
Abstract
Aim: We applied genetic programming approaches to understand how molecular descriptors influence the inhibitory activity of serine protease inhibitors of Mycobacterium tuberculosis (Mtb) and to discover new inhibitors as drug candidates. Materials & methods: The descriptors of an experimental dataset of Mtb serine protease inhibitors were optimized using a genetic algorithm (GA) together with correlation-based feature selection (CFS) in order to develop predictive models with machine-learning algorithms. The best model was deployed on a library of 918 phytochemical compounds to screen for potential serine protease inhibitors of Mtb. The quality and performance of the predictive models were evaluated using standard statistical parameters. Results: The best random forest model with CFS-GA identified 126 potential anti-tubercular agents among the 918 phytochemical compounds. In addition, a genetic programming symbolic classification method optimized the descriptors and produced an equation for the mathematical model. Conclusion: Combining CFS-GA with random forest enhanced classification accuracy and predicted new serine protease inhibitors of Mtb, which can be used for better drug development against tuberculosis.
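To make the CFS-GA step named in the abstract more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), with Pearson correlation as the association measure, and a very simple GA (truncation selection, one-point crossover, bit-flip mutation). The function names, GA settings and toy descriptor columns are all hypothetical and chosen only for illustration; the random forest and screening steps are not shown.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(mask, columns, labels):
    """CFS merit of the descriptor subset flagged True in `mask`."""
    chosen = [col for col, keep in zip(columns, mask) if keep]
    k = len(chosen)
    if k == 0:
        return 0.0
    # Mean absolute descriptor-class correlation.
    r_cf = sum(abs(pearson(col, labels)) for col in chosen) / k
    # Mean absolute descriptor-descriptor correlation (0 for a single descriptor).
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(chosen[i], chosen[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ga_select(columns, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search descriptor subsets (bit masks) with a small GA; requires >= 2 columns."""
    rng = random.Random(seed)
    n = len(columns)

    def fitness(mask):
        return cfs_merit(mask, columns, labels)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Toy data (hypothetical): one informative descriptor, an exact duplicate of it,
# and one weakly related descriptor; labels mark inactive (0) / active (1).
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0],
]
best_mask = ga_select(columns, labels)
```

On this toy data the merit rewards the informative descriptor and penalizes adding the weakly related one, which is the behaviour that makes CFS a natural fitness function for a GA over descriptor subsets. In a workflow like the one the abstract outlines, the selected columns would then be passed to a classifier such as random forest.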